12 research outputs found

    NSEmo at EmoInt-2017: An Ensemble to Predict Emotion Intensity in Tweets

    In this paper, we describe a method to predict emotion intensity in tweets. Our approach is an ensemble of three regression methods. The first method uses content-based features (hashtags, emoticons, elongated words, etc.). The second method considers word n-grams and character n-grams for training. The final method uses lexicons, word embeddings, word n-grams, character n-grams for training the model. An ensemble of these three methods gives better performance than individual methods. We applied our method on WASSA emotion dataset. Achieved results are as follows: average Pearson correlation is 0.706, average Spearman correlation is 0.696, average Pearson correlation for gold scores in range 0.5 to 1 is 0.539, and average Spearman correlation for gold scores in range 0.5 to 1 is 0.514
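The four evaluation figures quoted in this abstract are Pearson and Spearman correlations, computed once over all tweets and once over tweets whose gold score lies in the range 0.5 to 1. A minimal sketch of how such metrics might be computed is given here; the function names are illustrative and are not taken from the paper or the shared task's evaluation script.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def filter_high_range(gold, pred, low=0.5):
    """Keep only the pairs whose gold score is at least `low`."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g >= low]
    return [g for g, _ in pairs], [p for _, p in pairs]
```

Calling `pearson(*filter_high_range(gold, pred))` would give the restricted-range variant of the metric under these assumptions.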

    Exploiting Meta Attributes for Identifying Event Related Hashtags

    Users in social media often participate in discussions regarding different events happening in the physical world (e.g., concerts, conferences, festivals) by posting messages, replying to or forwarding messages related to such events. In various applications like event recommendation, event reporting, etc. it might be useful to find user discussions related to such events from social media. Finding event related hashtags can be useful for this purpose. In this paper, we focus on the problem of finding relevant hashtags for a given event. Features are defined to identify the event related hashtags. We specifically look for features that use similarities of the hashtags with the event metadata attributes. A learning to rank algorithm is applied to learn the importance weights of the features towards the task of predicting the relevance of a hashtag to the given event. We experimented on events from four different categories (namely, Award ceremonies, E-commerce events, Festivals, and Product launches). Experimental results show that our method significantly outperforms the baseline methods

    A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts

    Wide usage of social media platforms has increased the risk of aggression, which results in mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect to others. The majority of such conversations contain code-mixed languages[28]. Additionally, the way thoughts are expressed, or the communication style, also changes from one social media platform to another (e.g., communication styles are different in Twitter and Facebook). All of these have increased the complexity of the problem. To solve these problems, we have introduced a unified and robust multi-modal deep learning architecture which works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both Glove and FastText embeddings). Finally, the system takes the decision based on model averaging. We evaluated our system on English Code-Mixed TRAC 2018 dataset and uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all the previous approaches on English code-mixed dataset and uni-lingual English dataset. Comment: 10 pages, 5 Figures, 6 Tables, accepted at CoDS-COMAD 202

    Identification of Relevant Hashtags for Planned Events Using Learning to Rank

    Lots of planned events (e.g. concerts, sports matches, festivals, etc.) keep happening across the world every day. In various applications like event recommendation, event reporting, etc. it might be useful to find user discussions related to such events from social media. Identification of event related hashtags can be useful for this purpose. In this paper, we focus on identifying the top hashtags related to a given event. We define a set of features for (event, hashtag) pairs, and discuss ways to obtain these feature scores. A linear aggregation of these scores is used to finally output a ranked list of top hashtags for the event. The aggregation weights of the features are obtained using a learning to rank algorithm. We establish the superiority of our method by performing detailed experiments on a large dataset containing multiple categories of events and related tweets
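The linear aggregation of feature scores described in this abstract can be illustrated with a small sketch. The feature names (`title_sim`, `desc_sim`), the scores, and the weights are all hypothetical; in the paper the weights are learned with a learning to rank algorithm, whereas here they are simply hard-coded.

```python
def rank_hashtags(feature_scores, weights, top_k=3):
    """Rank hashtags by a weighted sum of their per-feature scores.

    feature_scores: {hashtag: {feature_name: score}}
    weights:        {feature_name: weight} (assumed to be learned elsewhere)
    """
    scored = []
    for tag, features in feature_scores.items():
        total = sum(weights[name] * value for name, value in features.items())
        scored.append((tag, total))
    # Highest aggregated score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical (event, hashtag) feature scores for a single event.
example_scores = {
    "#oscars": {"title_sim": 0.9, "desc_sim": 0.6},
    "#redcarpet": {"title_sim": 0.2, "desc_sim": 0.7},
    "#monday": {"title_sim": 0.0, "desc_sim": 0.1},
}
example_weights = {"title_sim": 0.7, "desc_sim": 0.3}
top_hashtags = rank_hashtags(example_scores, example_weights)
```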

    Social Media Popularity Prediction of Planned Events Using Deep Learning

    Early prediction of popularity is crucial for recommendation of planned events such as concerts, conferences, sports events, performing arts, etc. Estimation of the volume of social media discussions related to the event can be useful for this purpose. Most of the existing methods for social media popularity prediction focus on estimating tweet popularity i.e. predicting the number of retweets for a given tweet. There is less focus on predicting event popularity using social media. We focus on predicting the popularity of an event much before its start date. This type of early prediction can be helpful in event recommendation systems, assisting event organizers for better planning, dynamic ticket pricing, etc. We propose a deep learning based model to predict the social media popularity of an event. We also incorporate an extra feature indicating how many days left to the event start date to improve the performance. Experimental results show that our proposed deep learning based approach outperforms the baseline methods

    IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events

    Retrieving relevant tweets corresponding to cultural events can be used in various applications like event reporting, event recommendation, etc. This type of retrieval is challenging due to the short length of tweets, noise, out-of-vocabulary words, and abbreviations. In this paper, we focus on the problem of retrieving relevant tweets related to a given cultural event of a festival. We consider several factors like BM25, DFR, presence of artist name, relevant hashtag, and festival name for finding the relevance of tweets to the event. We apply a BM25 + DFR model to retrieve a candidate set of tweets related to each event of a festival. We find the top hashtags for each event by exploring meta-attributes of an event. We re-rank the initial rank list from BM25 + DFR based on two strategies, namely, presence of the event meta-attributes (artist name, festival name, title, etc.) and the identified top hashtags in the tweet, and based on the timestamp of the event. We experimented on a subset of the CLEF 2017 cultural microblog contextualization dataset. The experimental results show that the proposed method is able to put relevant tweets at the top of the retrieval list

    An Ensemble Based Method for Predicting Emotion Intensity of Tweets

    Recently, user generated contents have increased tremendously in social media. Twitter is a popular micro-blogging platform in which users share their feelings, opinions, feedback, etc. It has been observed that microblogs are often associated with emotions. Several studies have focused on assigning a given tweet to one of the available emotion categories (e.g., anger, fear, joy, sadness). It is often useful in applications to find the intensity of emotion in the tweets. The focus on identifying emotion intensity is less in the literature. In this paper, we focus on determining the level of emotion intensity in the tweets. We use an ensemble of three methods: Convolution Neural Networks (CNN) with word embedding features, XGBoost with word n-gram and char n-gram features, and Support Vector Regression (SVR) with lexicon and word embedding features. The final prediction of the given tweet is obtained by the average of predictions of individual methods in the ensemble. The performance of ensemble is better than the methods in the ensemble due to diverse features. Our experimental results outperform baseline methods
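The averaging step of the ensemble can be sketched as follows. The three prediction lists stand in for the outputs of the CNN, XGBoost, and SVR models named in the abstract; the numbers are made up for illustration, and the models themselves are not reproduced here.

```python
def average_ensemble(predictions_per_model):
    """Average the predictions of several models, tweet by tweet.

    predictions_per_model: one list of intensity scores per model,
    all of the same length and in the same tweet order.
    """
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]

# Hypothetical intensity predictions for two tweets from three models.
cnn_pred = [0.2, 0.8]
xgb_pred = [0.4, 0.6]
svr_pred = [0.6, 0.7]
final_pred = average_ensemble([cnn_pred, xgb_pred, svr_pred])
```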

    Aggression Detection in Social Media using Deep Neural Networks

    With the rise of user-generated content in social media coupled with almost non-existent moderation in many such systems, aggressive contents have been observed to rise in such forums. In this paper, we work on the problem of aggression detection in social media. Aggression can sometimes be expressed directly or overtly or it can be hidden or covert in the text. On the other hand, most of the content in social media is non-aggressive in nature. We propose an ensemble based system to classify an input post into one of three classes, namely, Overtly Aggressive, Covertly Aggressive, and Non-aggressive. Our approach uses three deep learning methods, namely, Convolutional Neural Networks (CNN) with five layers (input, convolution, pooling, hidden, and output), Long Short Term Memory networks (LSTM), and Bi-directional Long Short Term Memory networks (Bi-LSTM). A majority voting based ensemble method is used to combine these classifiers (CNN, LSTM, and Bi-LSTM). We trained our method on a Facebook comments dataset and tested on Facebook comments (in-domain) and other social media posts (cross-domain). Our system achieves the F1-score (weighted) of 0.604 for Facebook posts and 0.508 for social media posts
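A majority voting ensemble over three classifiers can be sketched as below. With three classes and three voters, all three votes can differ; the abstract does not say how such ties are resolved, so this sketch assumes the vote of the earliest classifier in the list wins.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label among the classifiers' votes.

    Assumption (not from the paper): on a tie, the label voted for by
    the earliest classifier in `votes` is returned.
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label

# Hypothetical votes from the CNN, LSTM, and Bi-LSTM for one post.
decision = majority_vote(["Overtly Aggressive", "Non-aggressive", "Non-aggressive"])
```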

    A reranking-based tweet retrieval approach for planned events

    Twitter provides access to latest information. Whenever a major event happens, people try to search for event related information in social media platforms like Twitter. So, it is essential to develop methods to get good quality of event related tweets. People share different opinions, feelings, feedback, etc. about events happening around the world in Twitter in the form of tweets. These tweets are often short and contain noise. So, it is very difficult to get the most relevant data for a given event from Twitter. We propose a two-phase approach to retrieve the tweets related to planned events. In the first phase, initial retrieval is done by using BM25 algorithm. In the second phase, reranking is done by combining three scoring mechanisms namely BM25 score, top hashtags score related to an event, and top TF-IDF terms score related to an event. A learning to rank algorithm SVM_Rank is applied to give weights to these three methods and combine them to get the final score of the tweet. We performed experiments on two benchmark datasets CLEF and TREC. Experimental results show that our method outperforms baseline and literature methods for both the datasets according to multiple evaluation metrics. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
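The two-phase idea (BM25 retrieval, then reranking by a weighted combination of three scores) can be sketched in a few lines. This is an illustration only: it uses a common Okapi BM25 formulation with the conventional defaults `k1=1.2` and `b=0.75`, whitespace tokenisation, and hand-set weights, whereas the paper learns the weights with SVM_Rank. The hashtag and TF-IDF scores are assumed to be computed elsewhere.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each document for the query (non-empty docs assumed)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores

def rerank(docs, bm25, hashtag, tfidf, weights):
    """Order docs by a weighted sum of the three per-document scores."""
    w_bm25, w_hashtag, w_tfidf = weights
    combined = [w_bm25 * s1 + w_hashtag * s2 + w_tfidf * s3
                for s1, s2, s3 in zip(bm25, hashtag, tfidf)]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return [docs[i] for i in order]
```

In a fuller pipeline, `bm25_scores` would first select the candidate set, and `rerank` would then be applied to those candidates only.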

    A Neural Network-Based Ensemble Approach for Spam Detection in Twitter

    As the social networking sites get more popular, spammers target these sites to spread spam posts. Twitter is one of the most popular online social networking sites where users communicate and interact on various topics. Most of the current spam filtering methods in Twitter focus on detecting the spammers and blocking them. However, spammers can create a new account and start posting new spam tweets again. So there is a need for robust spam detection techniques to detect the spam at tweet level. These types of techniques can prevent the spam in real time. To detect the spam at tweet level, often features are defined, and appropriate machine learning algorithms are applied in the literature. Recently, deep learning methods are showing fruitful results on several natural language processing tasks. We want to use the potential benefits of these two types of methods for our problem. Toward this, we propose an ensemble approach for spam detection at tweet level. We develop various deep learning models based on convolutional neural networks (CNNs). Five CNNs and one feature-based model are used in the ensemble. Each CNN uses different word embeddings (Glove, Word2vec) to train the model. The feature-based model uses content-based, user-based, and n-gram features. Our approach combines both deep learning and traditional feature-based models using a multilayer neural network which acts as a meta-classifier. We evaluate our method on two data sets, one data set is balanced, and another one is imbalanced. The experimental results show that our proposed method outperforms the existing methods. IEE